ABSTRACT

A study was conducted to: (1) determine the simple
and multiple correlation coefficients between selected
educational/personal variables and academic achievement at
intermediate grade levels as measured by the Iowa Tests of Basic
Skills (ITBS); (2) determine the multiple linear regression equations for
predicting individual student achievement as measured by ITBS
subtests; and (3) cross-validate the regression equations determined
in this investigation. The general method used was the determination
and cross-validation of multiple linear regression equations for
predicting achievement of intermediate level children from individual
and school based data normally available. Variables used were: (1)
Dependent: vocabulary, reading, language, and arithmetic; (2)
Independent: achievement level, intelligence, sex, school mobility,
Aid for Dependent Children (ADC), age, race, years in school, and
learning rate. The sample population consisted of public school children
in grades five through eight. The data analysis procedure was Step-Up
Multiple Regression Analysis. The SPSS regression routine was used.
The highest correlations with post-achievement scores were those of
pre-achievement scores on the same scale and intelligence. There was a
low relation between
learning rate and achievement. In cross-validation, there were five
instances of correlations equal to or higher than those originally
obtained. Variables that emerged as significant predictors in
multiple regression were: pre-achievement, IQ, sex, and ADC.
(Author/CK)

Despite the pros and cons of standardized testing, most school systems
annually administer one or more standardized tests to students, especially
at the elementary and junior high levels. The realistic interpretation of 
individual and group scores, however, is by no means a simple matter. 

Results of standardized achievement tests are used for tracking students 
by achievement grouping, homogeneous grouping within classrooms, diagnostics 
of students' strengths and weaknesses, and so forth. Occasionally these 
results are used for curriculum and instruction evaluation. However, a number 
of difficulties have arisen from this practice. Among these difficulties is 
the ignoring of basic differences among student groups which affect achievement
but are beyond the control of the teacher. This can be partially alleviated
by determining the relations of selected variables to achievement and
taking the variables into consideration when analyzing individual and group
achievement test scores.

The relations between some educational-personal variables and educational
achievement have been well documented in the literature. Perhaps one of the
most frequently used predictors is intelligence as measured by some form
of IQ test. Gnauck, Johanna, and Kaczkowski (1961), in testing 180 Milwaukee
students in 7th and 8th grades, found correlations between .56 and .79 when
comparing Lorge-Thorndike (L-T) verbal IQ scores with various subtests of the
Iowa Tests of Basic Skills (ITBS). Comparisons with the L-T non-verbal IQ
scores showed lower coefficients ranging from .44 to .64.

In a study involving sixth graders in a rural central school in New
York State, Churchill and Smith (1966) found the L-T verbal and L-T non-verbal
scores to have correlation coefficients of .84 and .65, respectively, with the
composite ITBS scores. In this same study, they found that third and sixth
grade composite ITBS scores had a correlation coefficient of .79 for a
longitudinally matched sample of 56 students.

Knief and Stroud (1959), in a study involving 344 students, showed values
nearly identical to those found by Churchill and Smith when comparing L-T
verbal and L-T non-verbal to ITBS composite scores of fourth graders. These
researchers also found a correlation of .34 between social class (as measured
by the Warner Index of Status Characteristics) and ITBS scores. Multiple
correlation techniques employed by Knief and Stroud generally showed little
increase in coefficient values over that between L-T verbal and ITBS scores.

The effect of sex on academic achievement seems considerably less dramatic
than that of some previously mentioned variables. Parsley et al. (1963)
found no differences between the sexes in grades two through eight on tests
of reading vocabulary, reading comprehension, arithmetic reasoning, arithmetic
fundamentals, and IQ. However, they did cite other sources who claimed
differences between the sexes on similar achievement measures.

The use of fifth grade ITBS subtest scores (reading, language, arithmetic,
etc.) to predict corresponding eighth grade scores was the basis for a study
by Dyer, Linn and Patton (1969) in which 9,972 New York students were
compared on these measures. Correlation coefficients between corresponding
subtest scores ranged from .73 to .83. A major finding of this study was that
longitudinal studies of classes which, because of mobility, are unmatched
across time, and cross-sectional comparisons between two different grade
levels of students, did not provide results comparable to those obtained in
the longitudinal matching of individual students.

Based on the latter study, the authors did suggest that subsequent studies
of a similar nature control for the general effect of student mobility on those
students who remain in the program. This suggestion is based on their
findings of greater discrepancies between actual and predicted scores for those
students in schools with high mobility rates.

Based on these and other research results as well as practical
considerations within the school system, the investigators of this study selected
several of these variables in addition to a few others as a basis for
predicting fifth through eighth grade student ITBS subtest scores. The specific
foci of this study are summarized in the following section.



PURPOSE 

The purposes of this study were to (1) determine the
simple and multiple correlation coefficients between
selected educational-personal variables and academic
achievement at intermediate grade levels as measured
by the Iowa Tests of Basic Skills; (2) determine the
multiple linear regression equations for predicting
individual student achievement as measured by ITBS
subtests; and (3) cross-validate the regression
equations determined in this investigation. An additional
concern of this study was the examination of
the potential for using aggregates of individual
results for group predictions.

PROCEDURE



The general method for this study was the determination and
cross-validation of multiple linear regression equations for predicting
achievement of intermediate level children from individual and school based data
normally available.

Variables 

Of interest in this study was the prediction of achievement level of 
children from information available in school records. The variables are 
listed below.

Dependent Variables 

1. Vocabulary (Voc2) — vocabulary grade equivalent scores on the 
ITBS, post-test scores.

2. Reading (Rd2) — composite reading grade equivalent scores on 

the ITBS, post-test scores.

3. Language (L2) — composite language grade equivalent scores on 
the ITBS, post-test scores.

4. Arithmetic (Art2) — arithmetic grade equivalent scores on the 

ITBS, post-test scores.

Independent Variables 

1. Achievement level — pre-test scores on respective ITBS subtests.
These scores were obtained one year prior to the post-test scores.
Test data were obtained during the springs of 1970 and 1971.

2. Intelligence (IQ) — Lorge-Thorndike verbal intelligence test
scores were obtained at the same time as the pre-test achievement
scores.






3. Sex (S) — Sex of student with 0 = Male and 1 = Female. 

4. School Mobility (SM) — Percentage turnover of students as determined
by the formula: SM = (No. transferred in or out x 100) / (End of year enrollment).

5. Aid for Dependent Children (ADC) — ADC data on each child was not 
available from school records. However, the percent of school
enrollment from families receiving ADC was easily obtained.
Therefore, the school percentage ADC was taken as the value for
each child in that school. 

6. Age — The age of the child at time of pre-test was obtained 
from the children when they took the Lorge-Thorndike Intelligence
Test. 

7. Race — The race of individual students was not available. 

However, a racial count reported as the percentage of Caucasians 
was obtainable for each school. This school data was entered for 
each student. 

8. Years in School (YS) — The number of years the child had been in 
school at the time of pre-test. 

9. Learning Rate (Rate) — The rate of achievement growth as determined
by the formula: Rate = (Pre-test achievement level) / (YS + 1).
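The two derived variables above, School Mobility and Learning Rate, are simple ratios. The following is a minimal sketch of how they might be computed; the function names and the sample inputs are hypothetical and do not come from the study's data.

```python
def school_mobility(transfers_in_or_out, end_of_year_enrollment):
    """Percentage turnover: transfers in or out, times 100, over end-of-year enrollment."""
    return transfers_in_or_out * 100 / end_of_year_enrollment


def learning_rate(pre_test_level, years_in_school):
    """Pre-test achievement level (grade equivalent) divided by (years in school + 1)."""
    return pre_test_level / (years_in_school + 1)


# Hypothetical inputs, chosen only to exercise the two formulas.
sm = school_mobility(120, 800)   # 120 transfers against an enrollment of 800
rate = learning_rate(4.5, 5)     # grade equivalent 4.5 after 5 years in school
print(sm, rate)
```

Note that School Mobility is a school-level value, so every child in the same school would receive the same SM, as is also the case for the ADC and Race variables.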



Sample 

The population from which the samples were drawn was all the public school
children in grades five through eight in the St. Louis City Public School
System. For each grade level a master computer tape containing all the
needed data on each child was generated from the data available through the
system's Data Processing Center and Division of Research and Evaluation.
At that time the data tapes were edited so that students with partial
information were discarded. Approximately 20 percent of the population
was lost at this stage.

Two 25 percent samples of subjects were drawn from each edited data
tape and written on separate tapes. For one tape, every fourth student was
selected starting with the first student; for the other tape, every fourth
student was taken starting with the second student. This procedure resulted
in samples with complete data of the following sizes: grade five, 1680;
grade six, 1620; grade seven, 1680; and grade eight, 1432.

At each grade level one of the tapes was used for data analysis and
the other tape was used for cross-validation.
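The selection rule described above is a systematic one-in-four sample with two different starting points, which guarantees that the two samples do not overlap. A minimal sketch, assuming the edited tape can be treated as an ordered list (the student identifiers here are hypothetical):

```python
def every_fourth(records, start):
    """Return every fourth record, beginning at zero-based position `start`."""
    return records[start::4]


# Hypothetical edited tape holding 20 student identifiers in tape order.
student_ids = list(range(1, 21))

analysis_sample = every_fourth(student_ids, 0)          # starts with the first student
cross_validation_sample = every_fourth(student_ids, 1)  # starts with the second student

print(analysis_sample)
print(cross_validation_sample)
```

Because the two starting positions differ by one and the step is four, no student can appear in both samples.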

Data Analysis 

As noted previously, the purpose of this study was to determine the 
best set of predictors for school achievement in grades five through eight. 
Data were obtained for samples of about 1500 students. The data analysis
procedure was Step-Up Multiple Regression Analysis. The SPSS regression
routine was used.

The data analysis consisted of two steps. In the first step all
independent variables, with the exception of learning rate, were run
against post-test achievement scores. This consisted of four runs at each
grade level, one run each for Vocabulary, Reading, Language, and Arithmetic.
In total, 16 runs were made. At this point, the relations exhibited were
examined to determine the subset of variables which most consistently aided
in prediction.

After identification of the best set of predictors, the analyses
were repeated using only the identified independent variables. In a
few cases in the initial analyses, variables that were eliminated from
further consideration had loaded in equations prior to some of the
retained variables. Therefore, it was necessary to run the step-up
regression analyses a second time using only the final set of variables
to determine the actual contribution of each variable with respect to
the other variables being used.
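A step-up analysis of this kind adds one predictor at a time, each time taking the variable that most improves the fit, and stops when the best remaining candidate no longer contributes significantly. The sketch below is a generic forward-selection routine in plain Python, not the SPSS routine used in the study. The entry threshold of 2.71 is the F-value the report cites for the .10 level; the data, variable names, and helper functions are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * p for x, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def r_squared(columns, y):
    """R squared of a least-squares fit of y on the given columns plus an intercept."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(design[0])
    xtx = [[sum(row[p] * row[q] for row in design) for q in range(k)] for p in range(k)]
    xty = [sum(row[p] * y[i] for i, row in enumerate(design)) for p in range(k)]
    beta = solve(xtx, xty)
    mean_y = sum(y) / n
    ss_total = sum((v - mean_y) ** 2 for v in y)
    ss_resid = sum((y[i] - sum(b * x for b, x in zip(beta, design[i]))) ** 2
                   for i in range(n))
    return 1.0 - ss_resid / ss_total


def step_up(predictors, y, f_to_enter=2.71):
    """Forward selection: add the best remaining predictor while its F meets f_to_enter."""
    chosen, r2 = [], 0.0
    remaining = dict(predictors)
    n = len(y)
    while remaining:
        trials = {name: r_squared([predictors[c] for c in chosen] + [col], y)
                  for name, col in remaining.items()}
        best = max(trials, key=trials.get)
        df_resid = n - len(chosen) - 2  # n minus predictors in the trial model minus 1
        f_value = (trials[best] - r2) / ((1.0 - trials[best]) / df_resid)
        if f_value < f_to_enter:
            break
        chosen.append(best)
        r2 = trials[best]
        del remaining[best]
    return chosen, r2


# Hypothetical data: the outcome depends mostly on "pre" and slightly on "sex".
predictors = {
    "pre": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    "sex": [1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0],
    "other": [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0],
}
y = [3.1, 3.9, 6.9, 8.1, 11.1, 11.9, 14.9, 16.1]

order, r2 = step_up(predictors, y)
print(order, round(r2, 4))
```

Rerunning such a routine with only the retained variables, as the authors did, changes the order of entry and therefore the contribution attributed to each variable.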

Cross-Validation 

The regression equations were validated using the second sample of
students at each grade level. Cross-validation took three forms: relation
between predicted and actual scores; significance of differences between
mean predicted and actual scores for subsamples of students; and significance
of differences between distributions of predicted and actual scores for
subsamples of students.

The first of these consisted of determining the product-moment 
correlations between predicted scores and actual scores for each of the 16 
equations. The standard errors of estimate were determined using the
following formula.

Standard Error = sqrt(Total Variance - Predicted Variance)

One of the primary concerns for this study was the development of 
equations which could be used to determine at the beginning of a school 
year the achievement levels which could be expected in a given classroom at 
the close of the school year. The equations were based on individual student 
and school data. Predictions could then be made for individual students
and aggregated for the students in a given class. Therefore, of interest
in the cross-validation were estimates of the congruence of predicted and
actual scores for simulated class groups. Congruence was determined by
similarities in means and form of distribution. Significant differences
between means were tested using t-tests and between distributions using
Chi-squares. The significance level was set at .05. The simulated classroom
groups consisted of 35 students selected from the cross-validation
tapes. Fifteen of these samples were selected and analyzed for each of
the 16 equations.
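The comparison of mean predicted and actual scores within a simulated class amounts to a dependent-samples t-test on the per-student differences. The following is a minimal sketch using only the Python standard library; the scores are hypothetical and the class is shortened to four students for readability.

```python
import math
import statistics


def paired_t(predicted, actual):
    """Dependent-samples t statistic and degrees of freedom for predicted vs. actual."""
    diffs = [p - a for p, a in zip(predicted, actual)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    return mean_diff / standard_error, n - 1


# Hypothetical grade-equivalent scores for a very small simulated class.
predicted_scores = [5.0, 6.0, 7.0, 8.0]
actual_scores = [5.5, 5.5, 7.5, 7.5]

t_value, df = paired_t(predicted_scores, actual_scores)
print(t_value, df)
```

For a class of 35 students the test has 34 degrees of freedom, for which the two-tailed .05 critical value is roughly 2.03.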



RESULTS 



When calculating the step-wise regression, the simple correlations
between the independent variables and each dependent variable were obtained.
These correlations are presented in Table 1.

TABLE 1 ABOUT HERE

In a few instances, no
correlation is presented in the table. In these cases the variables added so
little to the predictions that they did not load into the equations and no
simple correlation was obtained from the computer program. However,
interpretations are still possible from the general levels of the coefficients in
the classes.

The highest correlations with post-achievement were pre-achievement
scores on the same scale and intelligence, with median correlations of .7767
and .6543, respectively. Interestingly, the only other two variables which
demonstrated even a moderate relation were ADC and racial count of the school,
with median correlations of .3054 and .2967, respectively.

The relatively low relations between learning rate and achievement are 
worth noting since the learning rate is a commonly used statistic. The median 
correlation was .1495, accounting for only about 2 percent of the variability 
in achievement scores. 
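The "about 2 percent" figure follows from squaring the correlation, since the squared correlation is the proportion of variance shared by two variables. A short sketch applying the same arithmetic to the median correlations reported above (the dictionary labels are only illustrative):

```python
# Median simple correlations with post-achievement, as reported in the text.
median_r = {
    "pre-achievement": 0.7767,
    "intelligence": 0.6543,
    "ADC": 0.3054,
    "race": 0.2967,
    "learning rate": 0.1495,
}

# Percent of variance accounted for is 100 times the squared correlation.
percent_variance = {name: round(100 * r * r, 1) for name, r in median_r.items()}

for name, pct in percent_variance.items():
    print(f"{name}: {pct} percent")
```

On this arithmetic, pre-achievement alone accounts for roughly 60 percent of the variance and learning rate for a little over 2 percent.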

The results of the initial regression analyses are presented in
Table 2.

TABLE 2 ABOUT HERE

Pre-achievement and intelligence were the first two
variables to load into the multiple-regression equations for predicting
post-achievement, with median correlations of .7767 and .7898, respectively.
The median loading orders for the remaining variables were as follows:
Sex, 4; School Mobility, 4.5; ADC, 5; Race, 5; Age, 6; and Years in School,
7.5. The first four of these variables tended to be highly similar in
loading order, with sex loading most often as variable number three. Years
in school was usually the last variable to load.

The pre-achievement scores accounted for about 60 percent of the
variance. Intelligence generally picked up about an additional 1 percent,
and any of the remaining variables less than one percent. As these figures
indicate, very little predictive efficiency was added after the second
variable. Statistically significant additions to the equations were usually
found for the first three or four variables.

For further analysis, the researchers decided to examine the
four independent variables which would provide the best equations and for
which the data could be easily and rapidly obtained. The first two variables
were pre-achievement and intelligence. The other two variables were sex and
ADC.

Sex was selected as one of the final set of variables since it was the
one that most often loaded third in the regression equations. ADC was
selected over race and school mobility even though it loaded slightly higher (5)
than the latter (4.5) and about the same as the former (5). ADC was deemed the most
appropriate variable since it was the simplest and most economical of the
three measures to obtain. Furthermore, for utilization in a city school
system it was important to include a poverty index in any system predicting
success. ADC served this function.

Learning rate was added as the fifth variable to this final set.
Thus the final set of predictor variables included the cognitive variables
of achievement level, ability, and learning rate, sex of the student, and
ADC level of the school.

The results of the final regression analyses are presented in Table 3.

TABLE 3 ABOUT HERE

Examination of the table indicates that the variables generally loaded in
the following order: pre-achievement, intelligence, sex, ADC, and learning
rate. Learning rate loaded significantly (α = .10) on only three analyses
and even in these instances it was the fourth variable, adding very little
to the prediction efficiency. Therefore, the variable was eliminated from
further consideration.

For the remaining variables, the criterion for inclusion into equations
was a regression coefficient significantly different from zero as determined
by the analysis of variance at the .10 significance level. Thus, an F-value of
at least 2.71 with 1/∞ degrees of freedom was required for a variable to be
included in a regression equation. The final regression equations were as
follows:

Fifth Grade

Voc2 = .63182 (Voc1) + .02761 (IQ) - .00375 (ADC) - .20406 (S) + .18109
Rd2 = .53769 (Rd1) + .02771 (IQ) - .00294 (ADC) + .50885
L2 = .70583 (L1) + .02200 (IQ) + .13306 (S) + .29268
Art2 = .59338 (Art1) + .02329 (IQ) - .00162 (ADC) + .61040

Sixth Grade

Voc2 = .62483 (Voc1) - .00498 (ADC) + .03039 (IQ) - .25166 (S) + .04537
Rd2 = .58858 (Rd1) + .03126 (IQ) - .00265 (ADC) + .04957
L2 = .83019 (L1) + .01735 (IQ) + .09640 (S) + .12835
Art2 = .72892 (Art1) + .02099 (IQ) + .30606

Seventh Grade

Voc2 = .60383 (Voc1) + .03744 (IQ) - .06501
Rd2 = .63257 (Rd1) + .03467 (IQ) - .00510 (ADC) + .09338
L2 = .88087 (L1) + .01810 (IQ) + .23276 (S) + .12270
Art2 = .78030 (Art1) + .01924 (IQ) - .00282 (ADC) + .14655 (S) + .81162

Eighth Grade

Voc2 = .52831 (Voc1) + .03763 (IQ) + .01095 (ADC) + .82748
Rd2 = .60646 (Rd1) + .03757 (IQ) - .00218 (ADC) + .12794 (S) + .07078
L2 = .77128 (L1) + .02112 (IQ) + .303134 (S) + .70806
Art2 = .74545 (Art1) + .01603 (IQ) 1.46276
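To show how such an equation would be applied, the sketch below evaluates the sixth-grade arithmetic equation, Art2 = .72892 (Art1) + .02099 (IQ) + .30606, for individual students and then averages the predictions for a class. The student values are hypothetical, and the function name is introduced here only for illustration.

```python
def predict_art2_grade6(art1, iq):
    """Sixth-grade arithmetic equation: Art2 = .72892 (Art1) + .02099 (IQ) + .30606."""
    return 0.72892 * art1 + 0.02099 * iq + 0.30606


# Hypothetical students given as (pre-test arithmetic grade equivalent, IQ).
students = [(5.8, 95), (6.4, 105), (4.9, 88)]

predictions = [predict_art2_grade6(art1, iq) for art1, iq in students]
class_mean = sum(predictions) / len(predictions)

print([round(p, 2) for p in predictions])
print(round(class_mean, 2))
```

Averaging individual predictions in this way is the aggregation step the study goes on to examine for simulated classes of 35 students.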


CROSS-VALIDATION 

The regression equations were validated using non-overlapping samples
of students drawn from the same populations as the original data-producing
samples. The cross-validation samples consisted of 1680, 1620, 1680, and
1432 students from grades five through eight, respectively.

The first set of analyses was the determination of the correlations
between predicted and actual scores and the standard errors of estimate.
The results of these analyses are presented in Table 4.

TABLE 4 ABOUT HERE

The cross-validation
correlations are just about as high as the original ones. In five instances
the correlations were equal to or higher than those originally obtained.
Even though the correlations were relatively high, considerable error was
present in individual predictions. Most of the standard errors were in
the .80's and .90's. Thus the 68 percent confidence interval would have
a range of over 1.5 grade equivalents.
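The two cross-validation statistics discussed here can be computed directly from paired predicted and actual scores. The sketch below is an illustration on hypothetical data: it computes the product-moment correlation and then the standard error as the square root of total variance minus predicted variance, following the formula given in the Procedure section. It assumes the predicted variance does not exceed the total variance.

```python
import math
import statistics


def cross_validation_summary(predicted, actual):
    """Return (product-moment r, standard error of estimate) for paired scores."""
    mean_p = statistics.fmean(predicted)
    mean_a = statistics.fmean(actual)
    cross = sum((p - mean_p) * (a - mean_a) for p, a in zip(predicted, actual))
    ss_p = sum((p - mean_p) ** 2 for p in predicted)
    ss_a = sum((a - mean_a) ** 2 for a in actual)
    r = cross / math.sqrt(ss_p * ss_a)
    n = len(actual)
    # Total variance minus predicted variance; assumes the difference is not negative.
    standard_error = math.sqrt(ss_a / n - ss_p / n)
    return r, standard_error


# Hypothetical grade-equivalent scores for five students.
actual_scores = [4.0, 5.0, 6.0, 7.0, 8.0]
predicted_scores = [4.5, 5.5, 6.0, 6.5, 7.5]

r, se = cross_validation_summary(predicted_scores, actual_scores)
band_width = 2 * se  # a 68 percent band spans one standard error on each side
print(round(r, 4), se, band_width)
```

With a standard error of .80, the same arithmetic gives a band 1.6 grade equivalents wide, which matches the "over 1.5" range noted in the text.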

Of particular interest was the utilization of the data to predict 
achievement levels for specific classes, buildings, special learning 
groups, or other aggregates of students. Thus, a cross-validation concern 
was an estimate of the error when using aggregated scores.

For this segment of the study 15 subsamples of 35 students were drawn 
from each cross-validation tape. Thirty-five students were chosen because 
this number is similar to the number of students that might be expected to
be in a class at the intermediate school level. Two types of statistical 
analyses were done on each subsample. The first analysis was the dependent 
samples t-test to determine if significant differences could be expected 
between mean predicted and actual test scores. The second analysis was 
Chi-square to determine if the actual score distributions could be 
expected to be significantly different from the predicted score distributions. 
For this latter set of analyses the test scores were placed in frequency 
distributions of five classes with the middle intervals .5 points in width. 
For all tests, the significance level was set at .05. A summary of these
analyses is presented in Table 5.



TABLE 5 ABOUT HERE



None of the median t-values were significant. To one-tenth of a grade
level, there were no differences between the predicted and observed means for
10 of the 16 median t-tests. Similar results were obtained when testing
the significance of the distributions. None of the median Chi-squares were
significant. These results indicate that when aggregating scores for groups
of 35 students the actual and predicted distributions could be expected to
be highly similar with little if any difference between the mean grade
equivalents.

However, further examination of Table 5 indicates that there was
considerable variability of results in that 16.25 percent of the t's were
significant and 24.17 percent of the Chi-squares were significant. Some of
this variability may be inherent in the statistical techniques used in that
estimates of error were determined for each group separately even though
they were drawn from the same population of students. The use of a common
estimate of error may have reduced the number of significant tests.
Nevertheless, no systematic errors were noted in that the median t-values
were always close to zero, alternating about equally between plus and minus;
and the major differences in the distributions were the underprediction of
extreme values, a situation inherent in the utilization of the regression
model.

Of primary concern in these analyses was the estimation of standard 
errors of the differences between means for the aggregate groups. For the 
median t's, these standard errors ran from a low of .096 of a grade equivalent 
to .320 grade equivalent. The median value was .149. 

DISCUSSION 

The primary purpose of this investigation was to determine and
cross-validate regression equations for predicting ITBS achievement test scores
for students in grades five through eight. The independent variables were
(1) Pre-achievement scores, (2) IQ, (3) Sex, (4) School Mobility, (5) ADC
level for school, (6) Age, (7) Racial makeup of school, (8) Years in school,
and (9) Learning rate.

The largest single correlate of post-achievement scores was
pre-achievement with a median correlation of .7767. The second highest correlate was
IQ with a median correlation of .6543. The two school characteristics of
ADC and race were the only other variables which related even moderately
with post-achievement scores.

This latter result was particularly interesting since the variables
were fairly gross measures based on school data rather than individual
student information. These results indicate that the poverty level of the
school as reflected in its ADC percentage and the racial makeup as determined
by the percent Caucasian in the school are moderately related to achievement.

Thus, these factors need to be taken into consideration when revising
curriculum, planning teaching strategies, predicting student achievement,
and the like. However, whether or not differences in poverty or race caused
achievement differences or whether the variables were commonly related to
other variables was not determined in this study. Nonetheless, it seems
logical that variables such as IQ might have this commonality.

The multiple correlations when predicting post-achievement scores were
quite high, with only one of the 16 being below .70. The obtained
correlations were about equally split between the .70's and .80's. The
cross-validation correlations between predicted and observed scores tended
to be just slightly lower than the original correlations. Furthermore,
t-tests and Chi-squares run on subsamples of students indicated that similarity
between means of predicted and actual scores and similarity between score
distributions could be expected.

The variables that emerged as significant predictors in the multiple
regression equations were (1) pre-achievement, (2) IQ, (3) sex, and
(4) ADC. Pre-achievement and IQ loaded as the first two variables in
every equation. ADC and sex each loaded on eight equations. This latter
result was particularly interesting since sex demonstrated only low simple
correlations with post-achievement.

* Number in parentheses denotes the order in which the variable was added to the regression equation.

# F is the F-test value for significance of a variable at the point it first entered the equation.

Multiple correlations of ITBS criterion scores with successive subsets of predictor variables

*D = Mean absolute difference between predicted and obtained